Sequence Comparisons via Algorithmic Mutual Information
Abstract
One of the main problems in DNA and protein sequence comparisons is to decide whether the observed similarity of two sequences should be explained by their relatedness or by the mere presence of some shared internal structure, e.g., shared internal tandem repeats. The standard methods, which are based on statistics or classical information theory, can be used to discover either internal structure or mutual sequence similarity, but cannot take both into account. Consequently, currently used methods for sequence comparison employ "masking" techniques that simply eliminate sequences exhibiting internal repetitive structure before the comparison is made. The "masking" approach precludes discovery of homologous sequences of moderate or low complexity, which abound at both the DNA and protein levels. As a solution to this problem, we propose a general method based on algorithmic information theory and minimal length encoding. We show that algorithmic mutual information factors out the sequence similarity that is due to shared internal structure and thus enables discovery of truly related sequences. We extend the recently developed algorithmic significance method (Milosavljević & Jurka 1993) to show that significance depends exponentially on algorithmic mutual information.
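The idea in the abstract can be illustrated with a rough sketch. Algorithmic mutual information is not computable, so the code below substitutes a general-purpose compressor (zlib) for minimal length encoding and estimates I(x:y) as C(x) + C(y) - C(xy), where C is the compressed length in bits. The function names, the choice of zlib, and the reading of "depends exponentially" as a bound of the form 2^-d are all assumptions made for illustration. This is not the encoding scheme used by the authors.

```python
import random
import zlib


def compressed_bits(seq: str) -> int:
    """Bits in the zlib-compressed sequence: a crude stand-in for description length."""
    return 8 * len(zlib.compress(seq.encode("ascii"), 9))


def mutual_information_bits(x: str, y: str) -> int:
    """Estimate I(x:y) as C(x) + C(y) - C(xy), using zlib as the encoder (an assumption)."""
    return compressed_bits(x) + compressed_bits(y) - compressed_bits(x + y)


def significance_bound(d_bits: float) -> float:
    """Illustrative bound 2^-d: a saving of d bits is treated as exponentially unlikely by chance."""
    return 2.0 ** (-max(d_bits, 0.0))


def random_dna(n: int, seed: int) -> str:
    """Hypothetical helper: a reproducible pseudo-random DNA string of length n."""
    rng = random.Random(seed)
    return "".join(rng.choice("ACGT") for _ in range(n))


if __name__ == "__main__":
    complex_seq = random_dna(2000, seed=1)
    unrelated = random_dna(2000, seed=2)
    tandem = "AC" * 1000  # low-complexity sequence with internal repeats

    for label, a, b in [
        ("identical complex sequences", complex_seq, complex_seq),
        ("unrelated complex sequences", complex_seq, unrelated),
        ("identical tandem repeats", tandem, tandem),
    ]:
        d = mutual_information_bits(a, b)
        print(f"{label}: I ~ {d} bits, bound ~ {significance_bound(d):.3g}")
```

In this sketch, two identical tandem-repeat sequences receive a small estimate because each one is already cheap to encode on its own, so little is saved by encoding them together. That is the sense in which shared internal structure is factored out without masking. A general-purpose compressor is only a loose approximation, and a real application would need an encoder designed for biological sequences.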
Similar resources
Probabilistic Sufficiency and Algorithmic Sufficiency from the point of view of Information Theory
Given the importance of Markov chains in information theory, the definition of conditional probability for these random processes can also be defined in terms of mutual information. In this paper, the relationship between the concept of sufficiency and Markov chains from the perspective of information theory and the relationship between probabilistic sufficiency and algorithmic sufficien...
Mutual Dimension and Random Sequences
If S and T are infinite sequences over a finite alphabet, then the lower and upper mutual dimensions mdim(S : T) and Mdim(S : T) are the upper and lower densities of the algorithmic information that is shared by S and T. In this paper we investigate the relationships between mutual dimension and coupled randomness, which is the algorithmic randomness of two sequences R1 and R2 with respect t...
Standardized Mutual Information for Clustering Comparisons: One Step Further in Adjustment for Chance
Mutual information is a very popular measure for comparing clusterings. Previous work has shown that it is beneficial to make an adjustment for chance to this measure, by subtracting an expected value and normalizing via an upper bound. This yields the constant baseline property that enhances intuitiveness. In this paper, we argue that a further type of statistical adjustment for the mutual inf...
A New Entropy Based Model for the Detection of Correlated Mutations in Multiple Sequence Alignments
The recent advent of complete genome sequencing provides a tremendous amount of data for research on the structural basis of the function of proteins. However, the sheer amount of data is both a blessing and a curse. In order to facilitate the utilization of this information, numerous algorithmic analysis procedures have been developed to identify functionally important residues. In this p...
An operational characterization of mutual information in algorithmic information theory
We show that the mutual information, in the sense of Kolmogorov complexity, of any pair of strings x and y is equal, up to logarithmic precision, to the length of the longest shared secret key that two parties, one having x and the complexity profile of the pair and the other one having y and the complexity profile of the pair, can establish via a probabilistic protocol with interaction on a pu...
Journal: Proceedings. International Conference on Intelligent Systems for Molecular Biology
Volume: 2
Publication date: 1994